
projecting the real-valued (32-bit) variable $x$ onto a set as
$$Q = \{a_1, a_2, \cdots, a_n\}, \tag{6.1}$$

where $Q$ is a discrete set and $n$ is the size of $Q$. For example, $n$ is set as $2^{16}$ when performing 16-bit quantization. Then, we define the projection of $x \in \mathbb{R}$ onto the set $Q$ as

$$
\mathrm{PR}_Q(x) =
\begin{cases}
a_1, & x < \frac{a_1 + a_2}{2} \\
\;\vdots & \\
a_i, & \frac{a_{i-1} + a_i}{2} \le x < \frac{a_i + a_{i+1}}{2} \\
\;\vdots & \\
a_n, & \frac{a_{n-1} + a_n}{2} \le x.
\end{cases}
\tag{6.2}
$$
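In practice, the midpoint conditions in Eq. (6.2) amount to snapping each element to its nearest level in $Q$. A minimal PyTorch sketch of this projection (the helper name `project_to_set` is our own, not from the referenced works):

```python
import torch

def project_to_set(x: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """Project each element of x onto its nearest level in Q (Eq. 6.2).

    levels: 1-D tensor holding the sorted levels a_1 < a_2 < ... < a_n.
    The midpoint rule in Eq. (6.2) is equivalent to nearest-level rounding.
    """
    # Distance from every element of x to every level: shape (..., n).
    dists = (x.unsqueeze(-1) - levels).abs()
    # Index of the closest level per element, then gather those levels.
    return levels[dists.argmin(dim=-1)]

# Example: projecting onto a uniform 2-bit set of levels.
levels = torch.tensor([-0.75, -0.25, 0.25, 0.75])
x = torch.randn(2, 3)
print(project_to_set(x, levels))
```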

By projecting the 32-bit weights and activations onto low-bit sets, the computational cost is reduced a great deal. In the extreme case, binarizing the weights and activations of a neural network decreases the storage and computation cost by 32× and 64×, respectively.

Considering the binarization process of BNNs, Eqs. (6.1) and (6.2) are relaxed into

$$
\mathrm{PR}_B(x) =
\begin{cases}
-1, & x < 0 \\
+1, & 0 \le x,
\end{cases}
\quad \text{s.t. } B = \{-1, +1\},
\tag{6.3}
$$

where we set $a_1 = -1$ and $a_2 = +1$. Then $\mathrm{PR}_B(\cdot)$ is equivalent to the sign function, i.e., $\mathrm{sign}(\cdot)$.
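Concretely, Eq. (6.3) is an elementwise sign with the convention $\mathrm{PR}_B(0) = +1$. A short sketch (note that `torch.sign` maps 0 to 0, so a comparison is used instead):

```python
import torch

def binarize(x: torch.Tensor) -> torch.Tensor:
    """PR_B of Eq. (6.3): project onto B = {-1, +1}.

    torch.sign(0) returns 0, so an explicit comparison enforces
    the convention that inputs equal to 0 map to +1.
    """
    return torch.where(x < 0, -torch.ones_like(x), torch.ones_like(x))
```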

The learning objective of conventional BNNs (XNOR-Net) is defined to minimize the geometric distance between $x$ and $\alpha\,\mathrm{PR}_B(x)$ as

$$
\arg\min_{x,\,\alpha} \; \left\| x - \alpha\,\mathrm{PR}_B(x) \right\|_2^2,
\tag{6.4}
$$

where $\alpha$ is an auxiliary scale factor. Recent works on binarized neural networks (BNNs) [199, 159] solve this objective explicitly as

$$
\alpha = \frac{\|x\|_1}{\mathrm{size}(x)},
\tag{6.5}
$$

where $\mathrm{size}(x)$ denotes the number of elements in $x$ (see the sketch below). However, this objective is insufficient to preserve the information of the real-valued counterpart $x$. To overcome this shortcoming, we introduce the kernel refining convolution.
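The closed-form solution of Eqs. (6.4) and (6.5) reduces to a mean of absolute values. A minimal sketch (the helper name `xnor_binarize` is ours; we compute one global $\alpha$ as in Eq. (6.5), whereas XNOR-Net commonly uses one scale per output channel):

```python
import torch

def xnor_binarize(w: torch.Tensor):
    """Solve Eq. (6.4) in closed form: w is approximated by alpha * PR_B(w).

    alpha = ||w||_1 / size(w)  (Eq. 6.5), i.e., the mean absolute value.
    """
    alpha = w.abs().mean()
    b = torch.where(w < 0, -torch.ones_like(w), torch.ones_like(w))
    return alpha, b

w = torch.randn(16, 3, 3, 3)
alpha, b = xnor_binarize(w)
print(alpha, (w - alpha * b).pow(2).sum())  # reconstruction error of Eq. (6.4)
```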

Furthermore, XNOR-Net, like most BNNs, leads to intra-channel feature homogenization, thus degrading the feature representation capacity. Hence, a new feature refinement method should be introduced.

6.2.2 Kernel Refining Generative Adversarial Learning (KR-GAL)

Given a conventional CNN model, we denote $w_i \in \mathbb{R}^{n_i}$ and $a_i \in \mathbb{R}^{m_i}$ as its weights and feature maps in the $i$-th layer, where $n_i = C_i \cdot C_{i-1} \cdot K_i \cdot K_i$ and $m_i = C_i \cdot W_i \cdot H_i$. $C_i$ represents the number of output channels of the $i$-th layer, $(W_i, H_i)$ are the width and height of the feature maps, and $K_i$ is the kernel size. Then we have

$$
a_i = a_{i-1} \otimes w_i,
\tag{6.6}
$$

where $\otimes$ is the convolution operation. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $\mathrm{PR}_B(w_i)$ and $\mathrm{PR}_B(a_i)$. For simplicity, in this chapter we denote $\mathrm{PR}_B(w_i)$ and $\mathrm{PR}_B(a_i)$ as $b^w_i \in B^{n_i}$ and $b^a_i \in B^{m_i}$, respectively.
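Putting Eqs. (6.3), (6.5), and (6.6) together, one binarized layer can be sketched as follows (the function `binary_conv` and its padding choice are our own illustration; a deployed BNN would replace the floating-point convolution with an XNOR-popcount kernel):

```python
import torch
import torch.nn.functional as F

def binary_conv(a_prev: torch.Tensor, w: torch.Tensor, padding: int = 1) -> torch.Tensor:
    """One BNN layer in the notation of Eq. (6.6): a_i = a_{i-1} (conv) w_i.

    a_prev: (N, C_{i-1}, H_i, W_i) real-valued input feature map a_{i-1}.
    w:      (C_i, C_{i-1}, K_i, K_i) real-valued kernels w_i.
    Both operands are binarized as in Eq. (6.3); the output is rescaled
    by alpha from Eq. (6.5).
    """
    alpha = w.abs().mean()                                             # Eq. (6.5)
    b_w = torch.where(w < 0, -torch.ones_like(w), torch.ones_like(w))  # b^w_i
    b_a = torch.where(a_prev < 0, -torch.ones_like(a_prev),
                      torch.ones_like(a_prev))                         # b^a_i
    return alpha * F.conv2d(b_a, b_w, padding=padding)

a = torch.randn(1, 3, 32, 32)
w = torch.randn(16, 3, 3, 3)
print(binary_conv(a, w).shape)  # torch.Size([1, 16, 32, 32])
```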